Chris Brown, University of Sydney
60 min including questions / interaction
Questionnaire (5 mins)
Software (20 mins)
Statistical tests (30 mins)
Survey analysis / Questions (5 mins)
1999 - 2002: BSc (Maths / Stats), University of Sydney
2002 - 2005: SPSS (Technical Support)
2004 - 2007: Masters Biostatistics
2005 - Now: Clinical Trials Centre, University of Sydney
2013 - 2015: Cancer Registry Ireland
2016 - Now: Bean Bar You (Chocolate Subscription)
| Characteristic | N = 16¹ |
|---|---|
| SAS | 0 (0%) |
| SPSS | 16 (100%) |
| Stata | 3 (19%) |
| R | 3 (19%) |
| Python | 0 (0%) |
| Git | 0 (0%) |
| REDCap | 0 (0%) |
| Other | 0 (0%) |

¹ n (%)
SAS
SPSS
Stata
R
Python
Other common tools:
REDCap
Git
https://www.kdnuggets.com/2010/06/software-popularity-of-data-analysis-software.html
Powerful / reliable / just works
Was the “standard” for pharmaceutical industry
Driven with code (programming)
Good sample size program (but is complicated)
Expensive / often available via Universities
https://www.ibm.com/products/spss-statistics
GUI well developed
Code is more integrated than in SPSS
Output is text-based but can now create Word / PDF / HTML documents
USD ~160-510
Open source (rebuild of a software called S+) = Free
Code based (People have created GUIs)
Huge community, great integrations (
RStudio (Posit) fostered the “Tidyverse” which
Packages / can get messy / easy to break things
Markdown (Quarto) / Shiny = Game changers!
Not a traditional statistical software package
Pandas / NumPy / Jupyter notebooks -> data science
Powerful, open source, huge community, AI
Open-source = FREE
Packages / can get messy / easy to break things
Version control text files (i.e. code)
Record changes over time + explanation of why
Able to roll back to an old version
Very useful if collaborating with others
https://betterexplained.com/articles/a-visual-guide-to-version-control/
| | H0 true | H1 true |
|---|---|---|
| Fail to reject H0 | Correct | Type 2 (\(\beta\)) |
| Reject H0 | Type 1 (\(\alpha\)) | Correct (Power = 1-\(\beta\)) |
https://heyspinner.com/random-number-wheel/1-2
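The error table can be made concrete with a small simulation. The sketch below is an illustration only: the sample size, effect size, number of simulations and seed are all assumed values, not taken from the talk. It repeatedly simulates a two-group trial, once with no true difference (H0 true) and once with a real difference (H1 true), and counts how often a t-test rejects H0.

```r
# Simulate many two-group trials and count how often a t-test rejects H0.
# All settings below are hypothetical, chosen only for illustration.
set.seed(2024)  # arbitrary seed so the sketch is repeatable

reject_rate <- function(true_diff, n_per_group = 30, n_sims = 2000, alpha = 0.05) {
  p_values <- replicate(n_sims, {
    control   <- rnorm(n_per_group, mean = 0, sd = 1)
    treatment <- rnorm(n_per_group, mean = true_diff, sd = 1)
    t.test(treatment, control)$p.value
  })
  mean(p_values < alpha)  # proportion of simulated trials that reject H0
}

type1_rate <- reject_rate(true_diff = 0)    # H0 true: every rejection is a Type 1 error
power_est  <- reject_rate(true_diff = 0.8)  # H1 true: every rejection is correct
type2_rate <- 1 - power_est                 # H1 true: failing to reject is a Type 2 error

print(c(type1 = type1_rate, power = power_est, type2 = type2_rate))
```

With H0 true, the rejection rate should sit close to the chosen \(\alpha\); with H1 true, the rejection rate estimates the power, 1-\(\beta\).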
Binary
Categorical (ordered/unordered)
Continuous
Time-to-event
Comparing proportions
Comparing continuous
Comparing time-to-event
From R’s “survival” package: “cancer” dataset
Survival in patients with advanced lung cancer from the North Central Cancer Treatment Group. Performance scores rate how well the patient can perform usual daily activities.
| Variable | Description |
|---|---|
| inst | Institution code |
| time | Survival time in days |
| status | censoring status 1=censored, 2=dead |
| age | Age in years |
| sex | Male=1 Female=2 |
| ph.ecog | ECOG as rated by the physician. |
| ph.karno | KPS (bad=0, good=100) rated by physician |
| pat.karno | KPS as rated by patient |
| meal.cal | Calories consumed at meals |
| wt.loss | Weight loss in last six months (pounds) |
Two-sample comparison of proportions power calculation
n = 14
p1 = 0.2
p2 = 0.7051759
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
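Output in this format comes from `power.prop.test()` in base R's `stats` package. A minimal sketch of a call that should give a calculation like the one above, assuming n, p1 and power were supplied and p2 was left for R to solve; the second call, including `p2 = 0.5`, is a hypothetical example and not from the talk.

```r
# power.prop.test() is part of base R (package 'stats').
# Leave exactly one of n, p1, p2, power or sig.level unspecified
# and the function solves for it.
detectable <- power.prop.test(n = 14, p1 = 0.2, power = 0.8, sig.level = 0.05)
print(detectable)

# The more common direction: fix both proportions and solve for n per group.
# p2 = 0.5 is a hypothetical value chosen only for illustration.
required <- power.prop.test(p1 = 0.2, p2 = 0.5, power = 0.8, sig.level = 0.05)
n_per_group <- ceiling(required$n)  # round up to whole participants
print(n_per_group)
```

With only 14 per group, only a very large difference in proportions is detectable with 80% power, which is worth knowing before the study starts.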
tbl_summary(cancer_clean,
include = c(age, meal.cal, wt.loss),
statistic = list(all_continuous() ~ "{mean} ({sd})" ),
by=sex
) %>%
add_p(test=list(all_continuous() ~ "t.test"))

| Characteristic | 1 | 2 | p-value |
|---|---|---|---|
| age | 63 (9) | 61 (9) | 0.064 |
| meal.cal | 981 (413) | 841 (369) | 0.020 |
| Unknown | 24 | 23 | |
| wt.loss | 11 (13) | 8 (13) | 0.060 |
| Unknown | 10 | 4 | |

Columns 1 and 2 (levels of sex): Mean (SD). p-value: Welch Two Sample t-test.
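The same kind of comparison can be run without gtsummary. A minimal sketch using base R's `t.test()` on the `cancer` data from the survival package; it uses the raw data rather than `cancer_clean`, whose cleaning steps are not shown here, so results may not match the table exactly.

```r
library(survival)  # provides the 'cancer' data frame

# Welch two-sample t-test of calories consumed at meals, by sex.
# Rows with missing meal.cal are dropped by the default na.action.
welch <- t.test(meal.cal ~ sex, data = cancer)
print(welch)

# Group means and SDs, comparable to the "Mean (SD)" columns
group_summary <- aggregate(meal.cal ~ sex, data = cancer,
                           FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(group_summary)
```

`t.test()` uses the Welch (unequal variance) version by default; `var.equal = TRUE` would give the classical pooled-variance test instead.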
Regression
UV
MV
Repeated measures / clustering
Time to event
Cox-proportional hazards
Competing risks
Non-proportional hazards can appear as curves “crossing” or “diverging”
If so, a single number (hazard ratio) may not be the appropriate summary
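One way to check the proportional hazards assumption formally is `cox.zph()` from the survival package, which tests the scaled Schoenfeld residuals. A minimal sketch, assuming a simple model with sex as the only covariate:

```r
library(survival)

# Cox model with sex as the only covariate
fit <- coxph(Surv(time, status) ~ sex, data = cancer)

# Test based on scaled Schoenfeld residuals: a small p-value suggests
# the effect of sex changes over time (non-proportional hazards).
ph_check <- cox.zph(fit)
print(ph_check)
```

`plot(ph_check)` draws the estimated effect over time; a roughly flat line supports summarising the effect with a single hazard ratio.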
Call:
coxph(formula = Surv(time, status) ~ sex, data = cancer)
n= 228, number of events= 165
coef exp(coef) se(coef) z Pr(>|z|)
sex -0.5310 0.5880 0.1672 -3.176 0.00149 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
sex 0.588 1.701 0.4237 0.816
Concordance= 0.579 (se = 0.021 )
Likelihood ratio test= 10.63 on 1 df, p=0.001
Wald test = 10.09 on 1 df, p=0.001
Score (logrank) test = 10.33 on 1 df, p=0.001
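The hazard ratio and its confidence interval are the exponentiated coefficient and interval limits. A minimal sketch that refits the model in the Call above and pulls those numbers out directly; the object names are illustrative.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ sex, data = cancer)

hr    <- exp(coef(fit))     # hazard ratio = exp(coef)
hr_ci <- exp(confint(fit))  # Wald 95% confidence interval on the HR scale

print(round(c(HR = unname(hr), lower = hr_ci[1, 1], upper = hr_ci[1, 2]), 3))
```

Extracting the numbers from the fitted object, rather than retyping them from the printout, avoids copy/paste errors in the report.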
| Characteristic | HR¹ | 95% CI¹ | p-value |
|---|---|---|---|
| sex | 0.59 | 0.42, 0.82 | 0.001 |

¹ HR = Hazard Ratio, CI = Confidence Interval
| Characteristic | N | HR¹ | 95% CI¹ | p-value |
|---|---|---|---|---|
| sex | 228 | | | 0.001 |
| male | | — | — | |
| female | | 0.59 | 0.42, 0.82 | |
| age | 228 | 1.02 | 1.00, 1.04 | 0.039 |
| age_10 | 228 | 1.21 | 1.01, 1.44 | 0.039 |
| ph.ecog | 227 | 1.61 | 1.29, 2.01 | <0.001 |
| ecog | 227 | | | <0.001 |
| 0 | | — | — | |
| 1 | | 1.45 | 0.98, 2.13 | |
| 2 | | 2.50 | 1.61, 3.88 | |
| 3/4 | | 9.10 | 1.22, 67.9 | |
| ph.karno_10 | 227 | 0.85 | 0.76, 0.95 | 0.006 |
| meal.cal_1000 | 181 | 0.88 | 0.56, 1.39 | 0.6 |
| wt.loss_10 | 214 | 1.01 | 0.90, 1.14 | 0.8 |

¹ HR = Hazard Ratio, CI = Confidence Interval
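The age and age_10 rows describe the same model on different scales. A minimal sketch of why, assuming age_10 is simply age divided by 10 (the derivation of that variable is not shown here):

```r
library(survival)

# Same model, two scales for age: per year and per 10 years.
# I(age / 10) stands in for the age_10 variable (an assumption).
fit_year   <- coxph(Surv(time, status) ~ age, data = cancer)
fit_decade <- coxph(Surv(time, status) ~ I(age / 10), data = cancer)

hr_per_year   <- unname(exp(coef(fit_year)))
hr_per_decade <- unname(exp(coef(fit_decade)))

# HR per 10 years = (HR per year)^10, so the p-value is unchanged
print(c(per_year = hr_per_year,
        per_decade = hr_per_decade,
        per_year_to_10th = hr_per_year^10))
```

Rescaling changes only how the hazard ratio reads, which is why a clinically meaningful unit (10 years, 1000 calories) is often easier to interpret than a hazard ratio very close to 1.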
| Characteristic | HR¹ | 95% CI¹ | p-value |
|---|---|---|---|
| sex | | | |
| male | — | — | |
| female | 0.58 | 0.39, 0.85 | 0.006 |
| age_10 | 1.13 | 0.90, 1.42 | 0.3 |
| ecog | | | |
| 0 | — | — | |
| 1 | 1.85 | 1.07, 3.20 | 0.026 |
| 2 | 4.81 | 2.06, 11.2 | <0.001 |
| 3/4 | 16.0 | 1.78, 143 | 0.013 |
| ph.karno_10 | 1.22 | 0.98, 1.52 | 0.076 |
| meal.cal_1000 | 0.97 | 0.58, 1.60 | 0.9 |
| wt.loss_10 | 0.89 | 0.76, 1.03 | 0.12 |

¹ HR = Hazard Ratio, CI = Confidence Interval
Consider your context and objective
Backwards / forwards selection
Best subset
LASSO regression
Want to learn more? I suggest Frank Harrell's Regression Modeling Strategies: https://hbiostat.org/rmsc/
Mixed effects logistic regression
Generalised estimating equations (GEE)
Use any package (one you can get support with)
Have a reproducible mindset (make life easy)
Use version control (keep things tidy)
Write comments for “future you” / others
https://docs.github.com/en/copilot
Aim to be able to run from source program to output report without manual intervention:
You won’t forget how to run / update things (+5 years)
Someone else can run it if they want to
You can’t make copy/paste / typing errors
Automatic updates (if the data changes)
My poster: Embedding reproducible research principles in clinical trial analyses
Please complete the 2nd part of the survey now… I really appreciate your feedback (use the QR code only if you have lost the page)
Want to develop a trial idea into a full protocol in 6 days?
Consider an ACORD protocol development workshop
https://www.moga.org.au/2026-acord-workshop
Any questions?